Search Results: "cjwatson"

11 December 2016

Colin Watson: The sad tale of CVE-2015-1336

Today I released man-db 2.7.6 (announcement, NEWS, git log), and uploaded it to Debian unstable. The major change in this release was a set of fixes for two security vulnerabilities, one of which affected all man-db installations since 2.3.12 (or 2.3.10-66 in Debian), and the other of which was specific to Debian and its derivatives. It's probably obvious from the dates here that this has not been my finest hour in terms of responding to security issues in a timely fashion, and I apologise for that. Some of this is just the usual life reasons, which I shan't bore you by reciting, but some of it has been that fixing this properly in man-db was genuinely rather complicated and delicate. Since I've previously advocated man-db over some of its competitors on the basis of a better security posture, I think it behooves me to write up a longer description.

I took over maintaining man-db over fifteen years ago in slightly unexpected circumstances (I got annoyed with its bug list and made a couple of non-maintainer uploads, and then the previous maintainer died, so I ended up taking over both in Debian and upstream). I was a fairly new developer at the time, and there weren't a lot of people I could ask questions of, but I did my best to recover as much of the history as I could and learn from it. One thing that became clear very quickly, both from my own inspection and from the bug list, was that most of the code had been written in a rather more innocent time. It was absolutely riddled with dangerous uses of the shell, poor temporary file handling, buffer overruns, and various common-or-garden deficiencies of that kind. I spent several years reworking large swathes of the codebase to be more robust against those kinds of bugs by design, and for example libpipeline came out of that effort.

The most subtle and risky set of problems came from the fact that the man and mandb programs were installed set-user-id to the man user. Part of this was so that man could maintain preformatted "cat pages", and part of it was so that users could run mandb if the system databases were out of date (this is now much less useful since most package managers, including dpkg, support some kind of trigger mechanism that can run mandb whenever new system-level manual pages are installed). One of the first things I did was to make this optional, and this has been a disabled-by-default debconf option in Debian for a long time now. But it's still a supported option and is enabled by default upstream, and when running setuid man and mandb need to take care to drop privileges when dealing with user-controlled data and to write files with the appropriate ownership and permissions.

My predecessor had problems related to this such as Debian #26002, and one of the ways they dealt with them was to make /var/cache/man/ set-group-id root, in order that files written to that directory would have consistent group ownership. This always struck me as rather strange and I meant to do something about it at some point, but until the first vulnerability report above I regarded it as mainly a curiosity, since nothing in there was group-writeable anyway. As a result, with the more immediate aim of making the system behave consistently and dealing with bug reports, various bits of code had accreted that assumed that /var/cache/man/ would be man:root 2755, and not all of it was immediately obvious. This interacted with the second vulnerability report in two ways.

Firstly, at some level it caused it, because I was dealing with the day-to-day problems rather than thinking at a higher level: a series of bugs led me down the path of whacking problems over the head with a recursive chown of /var/cache/man/ from cron, rather than working out why things got that way in the first place. Secondly, once I'd done that, I couldn't remove the chown without a much more extensive excursion into all the code that dealt with cache files, for fear of reintroducing those bugs. So although the fix for the second vulnerability is very simple in itself, I couldn't get there without dealing with the first vulnerability.

In some ways, of course, cat pages are a bit of an anachronism. Most modern systems can format pages quickly enough that it's not much of an issue. However, I'm loath to drop the feature entirely: I'm generally wary of assuming that just because I have a fast system that everyone does. So, instead, I did what I should have done years ago: make man and mandb set-group-id man as well as set-user-id man, at which point we can simply make all the cache files and directories be owned by man:man and drop the setgid bit on cache directories. This should be simpler and less prone to difficult-to-understand problems.

I expect that my next substantial upstream release will switch to --disable-setuid by default to reduce exposure, though, and distributions can start thinking about whether they want to follow that (Fedora already does, for example). If this becomes widely disabled without complaints then that would be good evidence that it's reasonable to drop the feature entirely. I'm not in a rush, but if you do need cat pages then now is a good time to write to me and tell me why. This is the fiddliest set of vulnerabilities I've dealt with in man-db for quite some time, so I hope that if there are more then I can get back to my previous quick response time.
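
The privilege discipline described above is the classic saved-set-ID pattern: a binary that is set-user-id and set-group-id to an unprivileged account temporarily becomes the invoking user while handling user-controlled data, and only takes the set-id identity back to write the shared cache. man-db itself is written in C, so the following Python sketch is purely illustrative of that pattern; the function names and the comments about what man would do at each point are my own, not man-db's code.

import os

def drop_privileges():
    """Temporarily become the invoking user: drop the group first, then the user."""
    os.setegid(os.getgid())
    os.seteuid(os.getuid())

def restore_privileges(saved_uid, saved_gid):
    """Regain the set-id identity: restore the user first, then the group."""
    os.seteuid(saved_uid)
    os.setegid(saved_gid)

# At startup the effective IDs are the set-id ones (e.g. the man user/group).
saved_uid, saved_gid = os.geteuid(), os.getegid()

drop_privileges()
# ... read user-supplied pages and run user-configurable formatters here ...
restore_privileges(saved_uid, saved_gid)
# ... now write preformatted cat pages into the man:man-owned cache ...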

3 December 2016

Ross Gammon: My Open Source Contributions June November 2016

So much for my monthly blogging! Here's what I have been up to in the Open Source world over the last 6 months, across Debian, Ubuntu and other projects, together with my plan for December: before the 5th January 2017 Debian Stretch soft freeze I hope to get a few more things done in Debian, Ubuntu and elsewhere.

3 May 2016

Raphaël Hertzog: My Free Software Activities in April 2016

My monthly report covers a large part of what I have been doing in the free software world. I write it for my donators (thanks to them!) but also for the wider Debian community because it can give ideas to newcomers and it's one of the best ways to find volunteers to work with me on projects that matter to me.

Debian LTS

I handled a new LTS sponsor that wanted to see wheezy keep supporting armel and armhf. This was not part of our initial plans (set during last DebConf) and I thus mailed all the teams that would be impacted if we were to collectively decide that it was OK to support those architectures. While I was hoping to get a clear answer rather quickly, it turns out that we never managed to get an answer to the question from all parties. Instead the discussion drifted onto the more general topic of how we handle sponsorship/funding in the LTS project. Fortunately, the buildd maintainers said they were OK with this and the ftpmasters had no objections, and they both implicitly enacted the decision: Ansgar Burchardt kept the armel/armhf architectures in the wheezy/updates suite when he handled the switch to the LTS team, and Aurélien Jarno also configured wanna-build to keep building armel/armhf for the suite. The DSA team did not confirm that this change would not interfere with one of their plans to decommission some hardware. Build daemons are a shared resource anyway and a single server is likely to handle builds for multiple releases.

DebConf 16

This month I registered for DebConf 16 and submitted multiple talk/BoF proposals: I want to share the setup we use in Kali, as it can be useful for other derivatives and also for Debian itself to help smooth the relationship with derivatives. I also want to open again the debate on the usage of money within Debian. It's a hard topic, but we should really strive to take some official position on what's possible and what's not possible. With Debian LTS and its sponsorship we have seen that we can use money to some extent without hurting the Debian project as a whole. Can this be transposed to other teams or projects? What are the limits? Can we define a framework and clear rules? I expect the discussion to be very interesting in the BoF. Mehdi Dogguy has agreed to handle this BoF with me.

Packaging

Django. I uploaded 1.8.12 to jessie-backports and 1.9.5 to unstable. I filed two upstream bugs (26473 and 26474) for two problems spotted by lintian. Unfortunately, when I wanted to upload it to unstable, the test suite did not run. I pinned this down to a sqlite regression. Chris Lamb filed #820225 and I contacted the SQLite and Django upstream developers by email to point them to this issue. I helped the SQLite upstream author (Richard Hipp) to reproduce the issue and he was quick to provide a patch which landed in 3.12.1. Later in the month I made another upload to fix an upgrade bug (#821789).

GNOME 3.20. As for each new version, I updated gnome-shell-timer to ensure it works with the new GNOME. This time I spent a bit more time to fix a regression (805347) that dates back a while and that would never be fixed otherwise, since the upstream author orphaned this extension (as he no longer uses GNOME). I have also been bitten by display problems where accented characters would be displayed below the character that follows. With the help of members of the GNOME team, we found out that this was a problem specific to the Cantarell font and was only triggered with HarfBuzz 1.2. This is tracked in Debian with #822682 on harfbuzz and #822762 on fonts-cantarell. There's a new upstream release (with the fix) ready to be packaged, but unfortunately it is blocked by the lack of a recent fontforge in Debian. I thus mailed debian-mentors in the hope of finding volunteers to help the pkg-fonts team package a newer version.

Misc Debian/Kali work

Distro Tracker. I started to mentor Vladimir Likic, who contacted me because he wants to contribute to Distro Tracker. I helped him to set up his development environment and we fixed a few issues in the process.

Bug reports. I filed many bug reports, most of them due to my work on Kali. I also investigated #819958, which was affecting testing, since it had been reported to Kali as well. And I made an NMU of dh-make-golang to fix #819472, which I reported earlier.

Thanks

See you next month for a new summary of my activities.


8 April 2016

Colin Watson: No more Hash Sum Mismatch errors

The Debian repository format was designed a long time ago. The oldest versions of it were produced with the help of tools such as dpkg-scanpackages and consumed by dselect access methods such as dpkg-ftp. The access methods just fetched a Packages file (perhaps compressed) and used it as an index of which packages were available; each package had an MD5 checksum to defend against transport errors, but being from a more innocent age there was no repository signing or other protection against man-in-the-middle attacks. An important and intentional feature of the early format was that, apart from the top-level Packages file, all other files were static in the sense that, once published, their content would never change without also changing the file name. This means that repositories can be efficiently copied around using rsync without having to tell it to re-checksum all files, and it avoids network races when fetching updates: the repository you're updating from might change in the middle of your update, but as long as the repository maintenance software keeps superseded packages around for a suitable grace period, you'll still be able to fetch them.

The repository format evolved rather organically over time as different needs arose, by what one might call distributed consensus among the maintainers of the various client tools that consumed it. Of course all sorts of fields were added to the index files themselves, which have an extensible format so that this kind of thing is usually easy to do. At some point a Sources index for source packages was added, which worked pretty much the same way as Packages except for having a different set of fields. But by far the most significant change to the repository structure was the package pools project. The original repository layout put the packages themselves under the dists/ tree along with the index files. The dists/ tree is organised by suite (modern examples of which would be "stable", "stable-updates", "testing", "unstable", "xenial", "xenial-updates", and so on). This meant that making a release of Debian tended to involve copying lots of data around, and implementing the testing suite would have been very costly. Package pools solved this problem by moving individual package files out of dists/ and into a new pool/ tree, allowing those files to be shared between multiple suites with only a negligible cost in disk space and mirror bandwidth. From a database design perspective this is obviously much more sensible. As part of this project, the original Debian dinstall repository maintenance scripts were replaced by da-katie, or "dak", which among other things used a new apt-ftparchive program to build the index files; this replaced dpkg-scanpackages and dpkg-scansources, and included its own database cache which made a big difference to performance at the scale of a distribution.

A few months after the initial implementation of package pools, Release files were added. These formed a sort of meta-index for each suite, telling APT which index files were available (main/binary-i386/Packages, non-free/source/Sources, and so on) and what their checksums were. Detached signatures were added alongside that (Release.gpg) so that it was now possible to fetch packages securely given a public key for the repository, and client-side verification support for this eventually made its way into Debian and Ubuntu. The repository structure stayed more or less like this for several years.

At some point along the way, those of us by now involved in repository maintenance realised that an important property had been lost. I mentioned earlier that the original format allowed race-free updates, but this was no longer true with the introduction of the Release file. A client now had to fetch Release and then fetch whichever other index files such as Packages they wanted, typically in separate HTTP transactions. If a client was unlucky, these transactions would fall on either side of a mirror update and they'd get a Hash Sum Mismatch error from APT. Worse, if a mirror was unlucky and also didn't go to special lengths to verify index integrity (most don't), its own updates could span an update of its upstream mirror and then all its clients would see mismatches until the next mirror update. This was compounded by using detached signatures, so Release and Release.gpg were fetched separately and could be out of sync.

Fixing this has been a long road (the first time I remember talking about this was in late 2007!), and we've had to take care to maintain client/server compatibility along the way. The first step was to add inline-signed versions of the Release file, called InRelease, so that there would no longer be a race between fetching Release and fetching its signature. APT has had this for a while, Debian's repository supports it as of stretch, and we finally implemented it for Ubuntu six months ago. Dealing with the other index files is more complicated, though; it isn't sensible to inline them, as clients usually only need to fetch a small fraction of all the indexes available for a given suite. The solution we've ended up with, thanks to Michael Vogt's work implementing it in APT, is called by-hash and should be familiar in concept to people who've used git: with the exception of the top-level InRelease file, index files for suites that support the by-hash mechanism may now be fetched using a URL based on one of their hashes listed in InRelease. This means that clients can now operate like this: fetch InRelease once, and then fetch each of the other index files they need at a URL derived from its hash as listed in InRelease. Since those hash-addressed files never change once published, a mirror update that happens between the two requests can no longer cause a mismatch.

This is now enabled by default in Ubuntu. It's only there as of xenial (16.04), since earlier versions of Ubuntu don't have the necessary support in APT. With this, hash mismatches on updates should be a thing of the past.

There will still be some people who won't yet benefit from this. debmirror doesn't support by-hash yet; apt-cacher-ng only supports it as of xenial, although there's an easy configuration workaround. Full archive mirrors must make sure that they put new by-hash files in place before new InRelease files (I just fixed our recommended two-stage sync script to do this; ubumirror still needs some work; Debian's ftpsync is almost correct but needs a tweak for its handling of translation files, which I've sent to its maintainers). Other mirrors and proxies that have specific handling of the repository format may need similar changes. Please let me know if you see strange things happening as a result of this change. It's useful to check the output of apt -o Debug::Acquire::http=true update to see exactly what requests are being issued.
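
To make the by-hash flow above concrete, here is a rough sketch (not APT's actual implementation) of a client that fetches InRelease once and then retrieves a Packages index at a URL derived from its SHA256 hash. The mirror URL and suite are placeholder values, and a real client would of course verify the InRelease signature before trusting the hashes it contains.

import posixpath
import urllib.request

MIRROR = "http://archive.ubuntu.com/ubuntu"  # placeholder: any Debian-format mirror
SUITE = "xenial"                             # placeholder suite

def fetch(url):
    with urllib.request.urlopen(url) as response:
        return response.read()

def parse_sha256_stanza(inrelease_text):
    """Return {relative index path: sha256 hex digest} from the SHA256 stanza."""
    hashes = {}
    in_sha256 = False
    for line in inrelease_text.splitlines():
        if line.rstrip() == "SHA256:":
            in_sha256 = True
        elif in_sha256 and line.startswith(" "):
            digest, _size, path = line.split()
            hashes[path] = digest
        elif in_sha256:
            break  # end of the indented stanza
    return hashes

inrelease = fetch("%s/dists/%s/InRelease" % (MIRROR, SUITE)).decode("utf-8")
hashes = parse_sha256_stanza(inrelease)

# The by-hash/ directory sits alongside the index it describes, so the
# Packages file is fetched by content rather than by name.
index = "main/binary-amd64/Packages.gz"
by_hash_url = "%s/dists/%s/%s/by-hash/SHA256/%s" % (
    MIRROR, SUITE, posixpath.dirname(index), hashes[index])
packages_gz = fetch(by_hash_url)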

30 March 2016

Colin Watson: Re-signing PPAs

Julian has written about their efforts to strengthen security in APT, and shortly before that notified us that Launchpad's signatures on PPAs use weak SHA-1 digests. Unfortunately we hadn't noticed that before; GnuPG's defaults tend to result in weak digests unless carefully tweaked, which is a shame. I started on the necessary fixes for this immediately we heard of the problem, but it's taken a little while to get everything in place, and I thought I'd explain why, since some of the problems uncovered are interesting in their own right.

Firstly, there was the relatively trivial matter of using SHA-512 digests on new signatures. This was mostly a matter of adjusting our configuration, although writing the test was a bit tricky since PyGPGME isn't as helpful as it could be. (Simpler repository implementations that call gpg from the command line should probably just add the --digest-algo SHA512 option instead of imitating this.)

After getting that in place, any change to a suite in a PPA will result in it being re-signed with SHA-512, which is good as far as it goes, but we also want to re-sign PPAs that haven't been modified. Launchpad hosts more than 50000 active PPAs, though, a significant percentage of which include packages for sufficiently recent Ubuntu releases that we'd want to re-sign them for this. We can't expect everyone to push new uploads, and we need to run this through at least some part of our usual publication machinery rather than just writing a hacky shell script to do the job (which would have no idea which keys to sign with, to start with); but forcing full reprocessing of all those PPAs would take a prohibitively long time, and at the moment we need to interrupt normal PPA publication to do this kind of work. I therefore had to spend some quality time working out how to make things go fast enough.

The first couple of changes (1, 2) were to add options to our publisher script to let us run just the one step we need in careful mode: that is, forcibly re-run the Release file processing step even if it thinks nothing has changed, and entirely disable the other steps such as generating Packages and Sources files. Then last week I finally got around to timing things on one of our staging systems so that we could estimate how long a full run would take. It was taking a little over two seconds per archive, which meant that if we were to re-sign all published PPAs then that would take more than 33 hours! Obviously this wasn't viable; even just re-signing xenial would be prohibitively slow.

The next question was where all that time was going. I thought perhaps that the actual signing might be slow for some reason, but it was taking about half a second per archive: not great, but not enough to account for most of the slowness. The main part of the delay was in fact when we committed the database transaction after processing each archive, but not in the actual PostgreSQL commit, rather in the ORM invalidate method called to prepare for a commit. Launchpad uses the excellent Storm for all of its database interactions. One property of this ORM (and possibly of others; I'll cheerfully admit to not having spent much time with other ORMs) is that it uses a WeakValueDictionary to keep track of the objects it's populated with database results. Before it commits a transaction, it iterates over all those alive objects to note that if they're used in future then information needs to be reloaded from the database first.
Usually this is a very good thing: it saves us from having to think too hard about data consistency at the application layer. But in this case, one of the things we did at the start of the publisher script was:
def getPPAs(self, distribution):
    """Find private package archives for the selected distribution."""
    if (self.isCareful(self.options.careful_publishing) or
            self.options.include_non_pending):
        return distribution.getAllPPAs()
    else:
        return distribution.getPendingPublicationPPAs()
def getTargetArchives(self, distribution):
    """Find the archive(s) selected by the script's options."""
    if self.options.partner:
        return [distribution.getArchiveByComponent('partner')]
    elif self.options.ppa:
        return filter(is_ppa_public, self.getPPAs(distribution))
    elif self.options.private_ppa:
        return filter(is_ppa_private, self.getPPAs(distribution))
    elif self.options.copy_archive:
        return self.getCopyArchives(distribution)
    else:
        return [distribution.main_archive]
That innocuous-looking filter means that we do all the public/private filtering of PPAs up-front and return a list of all the PPAs we intend to operate on. This means that all those objects are alive as far as Storm is concerned and need to be considered for invalidation on every commit, and the time required for that stacks up when many thousands of objects are involved: this is essentially accidentally quadratic behaviour, because all archives are considered when committing changes to each archive in turn. Normally this isn't too bad because only a few hundred PPAs need to be processed in any given run; but if we're running in a mode where we're processing all PPAs rather than just ones that are pending publication, then suddenly this balloons to the point where it takes a couple of seconds. The fix is very simple, using an iterator instead so that we don't need to keep all the objects alive:
from itertools import ifilter
def getTargetArchives(self, distribution):
    """Find the archive(s) selected by the script's options."""
    if self.options.partner:
        return [distribution.getArchiveByComponent('partner')]
    elif self.options.ppa:
        return ifilter(is_ppa_public, self.getPPAs(distribution))
    elif self.options.private_ppa:
        return ifilter(is_ppa_private, self.getPPAs(distribution))
    elif self.options.copy_archive:
        return self.getCopyArchives(distribution)
    else:
        return [distribution.main_archive]
After that, I turned to that half a second for signing. A good chunk of that was accounted for by the signContent method taking a fingerprint rather than a key, despite the fact that we normally already had the key in hand; this caused us to have to ask GPGME to reload the key, which requires two subprocess calls. Converting this to take a key rather than a fingerprint gets the per-archive time down to about a quarter of a second on our staging system, about eight times faster than where we started.

Using this, we've now re-signed all xenial Release files in PPAs using SHA-512 digests. On production, this took about 80 minutes to iterate over around 70000 archives, of which 1761 were modified. Most of the time appears to have been spent skipping over unmodified archives; even a few hundredths of a second per archive adds up quickly there. The remaining time comes out to around 0.4 seconds per modified archive. There's certainly still room for speeding this up a bit. We wouldn't want to do this procedure every day, but it's acceptable for occasional tasks like this. I expect that we'll re-sign wily, vivid, and trusty Release files in the same way soon.
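
As an aside on the earlier point about simpler repository implementations that just call gpg from the command line: a hedged sketch of producing Release.gpg and InRelease with SHA-512 digests might look like the following. The key ID and paths are placeholders, and this is not how Launchpad itself signs archives (it goes through GPGME, as described above).

import subprocess

KEY_ID = "0xDEADBEEF"              # placeholder signing key
RELEASE = "dists/xenial/Release"   # placeholder path to a Release file

# Detached, ASCII-armoured signature (Release.gpg) using a SHA-512 digest.
subprocess.run(
    ["gpg", "--batch", "--yes",
     "--local-user", KEY_ID,
     "--digest-algo", "SHA512",
     "--armor", "--detach-sign",
     "--output", RELEASE + ".gpg", RELEASE],
    check=True)

# Inline-signed variant (InRelease), which avoids the Release/Release.gpg race.
subprocess.run(
    ["gpg", "--batch", "--yes",
     "--local-user", KEY_ID,
     "--digest-algo", "SHA512",
     "--clearsign",
     "--output", "dists/xenial/InRelease", RELEASE],
    check=True)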

2 December 2015

Colin Watson: SSH SHA-2 support in Twisted

Launchpad operates a few SSH endpoints: bazaar.launchpad.net and git.launchpad.net for code hosting, and upload.ubuntu.com and ppa.launchpad.net for uploading packages. None of these are straightforward OpenSSH servers, because they don't give ordinary shell access and they authenticate against users' SSH keys recorded in Launchpad; both of these are much easier to do with SSH server code that we can use in library form as part of another service. We use Twisted for several other tasks where we need event-based networking code, and its conch package is a good fit for this. Of course, this means that it's important that conch keeps up to date with the cryptographic state of the art in other SSH implementations, and this hasn't always been the case.

OpenSSH 7.0 dropped support for some old algorithms, including disabling the 1024-bit diffie-hellman-group1-sha1 key exchange method at run-time. Unfortunately, this also happened to be the only key exchange method that Launchpad's SSH endpoints supported (conch supported the slightly better diffie-hellman-group-exchange-sha1 method as well, but that was disabled in Launchpad due to a missing piece of configuration). SHA-2 support was clearly called for, and the fact that we had to get this sorted out in conch first meant that everything took a bit longer than we'd hoped.

In Twisted 15.5, we contributed support for several conch improvements. Between them, and with some adjustments to the lazr.sshserver package we use to glue all this together to add support for DH group exchange, these are enough to allow us not to rely on SHA-1 at all, and these improvements have now been rolled out to all four endpoints listed above. I've thus also uploaded OpenSSH 7.1 packages to Debian unstable.

If you also run a Twisted-based SSH server, upgrade it now! Otherwise it will be harder for users of recent OpenSSH client versions to use your server, and for good reason.

21 April 2015

Manuel A. Fernandez Montecelo: About the Debian GNU/Linux port for OpenRISC or1k

In my previous post I mentioned my involvement with the OpenRISC or1k port. It was the technical activity on which I spent most time during 2014 (Debian and otherwise, day job aside). I thought that it would be nice to talk a bit about the port for people who don't know about it, and give an update for those who do know and care. So this post explains a bit how it came to be, gives details about its development, and finally describes the current status. It is going to be written as a rather personal account, for that matter, since I did not get involved enough in the OpenRISC community at large to learn much about its internal workings and aspects that I was not directly involved with. There is not much information about all of this elsewhere, only bits and pieces scattered here and there, but especially not much public information at all about the development of the Debian port. There is an OpenRISC entry in the Debian wiki, but it does not contain much information yet. Hopefully, this piece will help a bit to preserve history and give an insight for future porters.

First Things First

I imagine that most people reading this post will be familiar with the terminology, but just in case: to create a new Debian port means to get a Debian system (GNU/Linux variant, in this case) to run on the OpenRISC or1k computer architecture. Setting to one side all differences between hardware and software, and as described on their site:
The aim of the OpenRISC project is to create free and open source computing platforms
It is therefore a good match for the purposes of Debian and the Free Software world in general. The processor has not been produced in silicon, or at least is not available to the masses; people with the necessary know-how can download the hardware description (Verilog) and synthesise it in an FPGA, or otherwise use simulators. It is not some piece of hardware that people can purchase yet, and there are no plans to mass-produce it in the near future either. The two people (including me) involved in this Debian port did not have the hardware, so we created the port entirely by cross-compiling from other architectures and then compiling inside Qemu. In a sense, we were creating a Debian port for hardware that "does not [physically] exist". The software that we built was tested by people who had hardware available in FPGA, though, so it was at least usable. I understand that people working on the arm64 port had to work similarly in the initial phases, working in the dark without access to real hardware to compile or test.

The Spark

The first time that I heard about the initiative to create the port was in late February of 2014, in a post which appeared in Linux Weekly News (sent by Paul Wise) and Slashdot. The original post announcing it was actually from late January, from Christian Svensson (blueCmd):
Some people know that I've been working on porting Glibc and doing some toolchain work. My evil master plan was to make a Debian port, and today I'm a happy hacker indeed! Below is a link to a screencast of me installing Debian for OpenRISC, installing python2.7 via apt-get (which you shouldn't do in or1ksim, it takes ages! (but it works!)) and running a small Python script. http://asciinema.org/a/7362
So, now, what can a Debian Hacker do when reading this? (Even if one's Hackery Level is not that high, as is my case.) And well, How Hard Can It Be? I mean, Really? Well, in my own defence, I knew that the answer to the last two questions would be a resounding "Very". But for some reason the idea grabbed me and I couldn't help but think that it would be a Really Exciting Project, and that somehow I would like to get involved. So I wrote to Christian offering my help after considering it for a few days, around mid March, and he welcomed me aboard.

The Ball Was Already Rolling

Christian had already been in contact with the people behind DebianBootstrap, and he had already created the repository http://openrisc.debian.net/ with many packages of the base system and beyond (read: packages name_version_or1k.deb available to download and install). Even now the packages are not signed with proper keys, though, so use your judgement if you want to try them. After a few weeks, I got up to speed with the status of the project and got my system working with the necessary tools. This basically meant sbuild/schroot to compile new packages, with the base system that Christian had already got working installed in a chroot (probably with the help of debootstrap), and qemu-system-or1k to simulate the system. Only a few of the packages were different from the versions in Debian, like gcc, binutils or glibc -- they had not been upstreamed yet. sbuild ran through qemu-system-or1k, so the compilation of new packages could happen "natively" (running inside Qemu) rather than cross-compiling the packages, pulling _or1k.deb packages for dependencies from the repository that he had prepared, and _all.deb packages from snapshots.debian.org. I started by trying to get the packages that I [co-]maintain in Debian compiled for this architecture, creating the corresponding _or1k.deb files. For most of them, though, I needed many dependencies compiled before I could even compile my packages.

The GNU autotools / autoreconf Problem

From very early on, many of the packages failed to build with messages such as:
Invalid configuration 'or1k-linux-gnu': machine 'or1k' not recognized
configure: error: /bin/bash ../config.sub or1k-linux-gnu failed
This means that software packages based on GNU autotools and using configure scripts need recent versions of the config.sub and config.guess files that they ship in their root directory in order to detect the architecture and generate the code accordingly. This is counter-intuitive, considering that GNU autotools were designed to help with portability; yet, in the case of creating new Debian ports, it meant that unless upstream shipped very recent versions of config.{guess,sub}, the package would not compile straight away on the new architectures -- even though invoking gcc without further ado would have worked without problems in most cases for native compilation. Of course this did not only affect or1k, and there was already the autoreconf effort underway as a way to update these files automatically when building Debian packages, pushed by people porting Debian to the new architectures added in 2013/2014 (mips64el, arm64, ppc64el), which encountered the same roadblock. This affected around a thousand source packages in unstable. A Royal Pain. Also, all of their reverse dependencies (packages that depended on those to be built) could not be compiled straight away. The bugs were not Release Critical, though (none of these architectures were officially accepted at the time), so for people not concerned with the new ports there was no big incentive to get them fixed. This problem, which conceptually is easily solvable, prevented new ports from even attempting to compile vast portions of the archive straight away (cleanly, without modifications to the package or to the host system), for weeks or months.

The GNU autotools / autoreconf Solution

We tackled this problem mainly in two ways. The first, more useful for Debian in general, was to do as other porters were doing and submit bug reports and patches to Debian packages requesting them to use autoreconf, and to NMU packages (uploading changes to the archive without the official maintainers' intervention). A few NMUs were made for packages which had had bug reports with patches available for a while, that were in the critical path to get many other packages compiled, and that were orphaned or had almost no maintainer activity. The people working on the other new ports, and mainly the Ubuntu people who helped with some of those ports and wanted to support them, had submitted a large number of requests since late 2013, so there was no shortage of NMUs to be made. Some porters, not being Debian Developers, could not easily get the changes applied; so I also helped the porters of other architectures a bit, especially later on before the Jessie freeze, to get as many packages compiled on those architectures as possible.

The second way was to create dpkg-buildpackage hooks that unconditionally updated config.{guess,sub} before attempting to build the package on the local build system. This local and temporary solution allowed us in the or1k port to get many _or1k.deb packages into the experimental repository, which in turn would allow many more packages to compile. After a few weeks, I set up several sbuild instances on a multi-core machine, continuously attempting to build packages that had not been built before and whose dependencies were available. Every now and then (typically several times per day at peak times) I pushed the resulting _or1k.deb files to the repository, so that more packages would have the necessary dependencies ready when their build was attempted.
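
A minimal sketch of the kind of local hook just described, assuming the up-to-date config.guess and config.sub shipped by Debian's autotools-dev package in /usr/share/misc: it simply overwrites every copy found in the unpacked source tree before the build starts. Exactly how it was wired into the build (dpkg-buildpackage hook, sbuild hook or a wrapper) is an assumption here, and the script itself is illustrative rather than the one we actually used.

import os
import shutil
import sys

# autotools-dev installs current copies of these files here.
FRESH_COPIES = {
    "config.guess": "/usr/share/misc/config.guess",
    "config.sub": "/usr/share/misc/config.sub",
}

def refresh_config_files(source_tree):
    """Overwrite every config.guess/config.sub found under source_tree."""
    for dirpath, _dirnames, filenames in os.walk(source_tree):
        for name in filenames:
            if name in FRESH_COPIES:
                target = os.path.join(dirpath, name)
                shutil.copy(FRESH_COPIES[name], target)
                print("updated", target)

if __name__ == "__main__":
    # Run from (or pass) the root of the unpacked source package.
    refresh_config_files(sys.argv[1] if len(sys.argv) > 1 else ".")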
Christian was doing something similar, and by April, at peak times, between the two of us we were compiling more than a hundred packages on some days -- a huge number of packages did not need any change other than up-to-date config.{guess,sub} files. At some point in late April, Christian set up wanna-build on a few hosts to do this more elegantly and smartly than my method, and more effectively as well.

Ugly Hacks, Bugs and Shortcomings in the Toolchain and Qemu

Some packages are extremely important because many other packages need them to compile (like cmake, Qt or GTK+), and they are themselves very complex and have dependency loops. They had deeper problems than the autoreconf issue and needed some seriously dirty hacking to get them built. To try to get as many packages compiled as possible, we sometimes compiled these important packages with some functionality disabled, disabling some binary packages (e.g. Java bindings) or especially disabling documentation (using DEB_BUILD_OPTIONS=nodoc when possible, and more aggressively when needed by removing chunks of debian/rules). I tried to use the more aggressive methods in as few packages as possible, though, about a dozen in total. We also used DEB_BUILD_OPTIONS=nocheck to speed up compilation and avoid build failures -- many packages' tests failed due to qemu-system-or1k not supporting multi-threading, which we could do nothing about at the time, but otherwise the packages mostly passed their tests fine.

Due to bugs and shortcomings in Qemu and the toolchain -- like the compiler lacking atomics, missing functionality in glibc, Qemu entering endless loops, or programs segfaulting (especially gettext, which is used by many packages, so its crashes caused many build failures) -- we had to resort to some very creative approaches or time-consuming dull work: editing debian/rules, or creating wrappers around the real programs to avoid or force certain options (like gcc -O0, since -O2 produced buggy binaries too often). To avoid having a mix of cleanly compiled and hacked packages in the same repository, Christian set up a two-tiered repository system -- the clean one and the dirty one. Into the dirty one we dumped all of the packages that we got built, no matter how. The packages in the clean one could use packages from the dirty one to build, but they themselves were compiled without any hackery. Of course this was not a completely airtight solution, since they could contain code injected at build time from the "dirty repository" (e.g. by static linking), and perhaps other quirks. We hoped to get rid of these problems later by rebuilding all packages against clean builds of all their dependencies. In addition, Christian also spent significant amounts of time working inside the OpenRISC community, debugging problems, testing and recompiling special versions of the toolchain that we could use to advance in our compilation of the whole archive. There were other people in the OpenRISC community implementing the necessary bits in the toolchain, but I don't know the details.

Good Progress

Olof Kindgren wrote the OpenRISC health report April 2014 (actually posted in May), explaining the status at the time of projects in the broad OpenRISC community, and talking about the software side, Debian port included. Sadly, I think that there have been no more "health reports" since then. There was also a new post on Slashdot entitled OpenRISC Gains Atomic Operations and Multicore Support shortly thereafter.
On the side of the Debian port, from time to time new versions of packages entered unstable and we started to use those newer versions. Some of them had nice fixes, like the autoreconf updates, so they did not require local modifications. In other cases, the new versions failed to build where old ones had worked (e.g. because the newer versions added support for, and dependencies on, new versions of gnutls, systemd or other packages not yet available for or1k), and we had to repeat or create more nasty hacks to get the packages built again. But in general, progress was very good. There were about 10k arch-dependent packages in Debian at the time, and we got about half of them compiled by the beginning of May, counting clean and dirty. And, if I recall correctly, there were around the same number of arch=all packages (which can be installed on any architecture, after the package is built on one of them). Counting both, it meant that systems using or1k had about 15k packages available, or 75% of the whole Debian archive (at least "main"; we excluded "contrib" and "non-free"). Not bad. By the middle to end of May, we had about 6k arch-dependent packages compiled, and 4k to go. The count of packages eventually reached ~6.6k at its peak (I think in June/July). Many had been built with hacks and not rebuilt cleanly yet, but everything was fine until the number of packages built plateaued.

Plateauing

There were multiple reasons for that. One of them is that after we had fixed the autoreconf issue locally in some packages, new versions were uploaded to Debian without fixing that problem (in many cases there was no bug report or patch yet, so it was understandable; in other cases the requests were ignored). The wanna-build for the clean repository set up by Christian rightly considered the package out-of-date and prepared to build the more recent version, which failed. Then, other packages entering the unstable archive and build-depending on newer versions of those could not be built ("BD-Uninstallable") until we built the newer versions of the dependencies in the dirty repository with local hacks. Consequently, the count of cleanly built packages went back and forth, when not backwards.

More challenging was the fact that, when creating a new port, language compilers which are written in that same language need to be built for that architecture first. Sometimes it is not the compiler itself, but the compile-time or run-time support for a language's modules that has not been ported yet. Obviously, as with other dependencies, large numbers of packages written in those languages are bound to remain uncompiled for a long time. As Colin Watson explained when porting Haskell's GHC to arm64 and ppc64el, untangling some of the chicken-and-egg problems of language compilers for new ports is extremely challenging. Perl and Python are pretty much a pre-requisite of the base Debian system, and Christian got them working early on. But, for example, in May 247 packages depended on r-base-dev (GNU R) for building, and 736 on ghc, and we did not have these dependencies compiled. Just counting those two, 1k source packages of the remaining 4k to 5k to be compiled for the new architecture would have to wait for a long time. Then there was Java, Mono, etc. Even more worrying were the pending issues with the toolchain, like atomics in glibc, or make check failing for some packages in the clean repository built with wanna-build.
Christian continued to work on the toolchain and to liaise with the rest of the OpenRISC community; I continued to request changes to the Debian archive, filing a few more requests to use autoreconf and pushing a few more NMUs. Though many requests were attended to, I soon got negative replies/reactions and backed off a bit. In the meantime, the porters of the other new architectures were mostly submitting requests to support them and not NMUing much either.

Upstreaming

Things continued more or less in the same state until the end of the summer. The new ports needed as many packages built as possible before the evaluation of which official ports to accept (in early September, I think, with the final decision around the time of the freeze). Porters of the other new architectures (and maintainers, and other helpful Debian Developers) were by then more active in pushing for changes, especially the remaining autoreconf issues, many of which benefited or1k. As I said before, I also kept pushing NMUs now and then, especially during the summer, for packages which were not of immediate benefit for our port but helped the others (e.g. ppc64el needed updates to ltmain.sh for libtool which were not necessary for or1k, in addition to config.{guess,sub}).

In parallel, in the or1k camp, there were patches that needed to be sent upstream, for example for Python's NumPy, which I submitted to the Debian package and upstream in May and which was uploaded to Debian in September with a new upstream release. Similar paths were followed between May and September for packages such as jemalloc, ocaml, gstreamer0.10, libgc, mesa, X.org's cf module and cmake (patch created by Christian). In April, Christian had reached the amazing milestone of tracking down all of the contributors of the GNU binutils port and getting them to assign copyright to the Free Software Foundation (FSF); all of the work was refreshed and upstreamed. In July or August, he started to gather information about the contributors of the GCC port, which had started more than a decade ago. After that, nothing much happened (from the outside) until the end of the year, when Christian sent a message to the OpenRISC community about the status of upstreaming GCC. There was only one person left who had not assigned copyright to the FSF, but it was a blocker. In addition, there was the need to find one or more maintainers to liaise with upstream, review the patches, fix the remaining failures in the test suite and keep the port in good shape. A few months later, from what I can gather, the status remains the same.

Current Status, and The Future?

In terms of the Debian port, there have not been huge visible changes since the end of the summer, and not only because of the Jessie freeze. It seems that for this effort to keep going forward and be sustainable, sorting out the issues with GCC and glibc is essential. That means having the toolchain completely pushed upstream and in good shape, and particularly completing the copyright assignment. Debian will not accept private forks of those essential packages without a very good reason, even in unofficially supported ports; and from the point of view of porters, working on the remaining not-yet-built packages with continuing problems in the toolchain is very frustrating and time-consuming. Other than that, there is already a significant amount of software available that could run on an or1k system, so I think that overall the project has achieved a significant amount of success.
Granted, KDE and LibreOffice are not available yet, and neither are the tools depending on Haskell or Java. But a lot of software is available (including things high in the stack, like XFCE), and in many respects it should provide a much more functional system than the one available in Linux (or other free software) systems in the late 1990s. If the blocking issues are sorted out in the near future, the effort needed to get a very functional port, on par with the unofficial Debian ports, should not be that big.

In my opinion, and looking at the big picture, that is not bad at all for an architecture whose hardware implementation is not easy to come by, and for which the port was created almost solely with simulators. That it has been possible to get this far with such meagre resources is an amazing feat of Free Software, and Debian in particular.

As for the future, time will tell, as usual. I will try to keep you posted if there are any significant changes.

26 October 2014

Colin Watson: Moving on, but not too far

The Ubuntu Code of Conduct says:
Step down considerately: When somebody leaves or disengages from the project, we ask that they do so in a way that minimises disruption to the project. They should tell people they are leaving and take the proper steps to ensure that others can pick up where they left off.
I've been working on Ubuntu for over ten years now, almost right from the very start; I'm Canonical's employee #17 due to working out a notice period in my previous job, but I was one of the founding group of developers. I occasionally tell the story that Mark originally hired me mainly to work on what later became Launchpad Bugs due to my experience maintaining the Debian bug tracking system, but then not long afterwards Jeff Waugh got in touch and said "hey Colin, would you mind just sorting out some installable CD images for us?". This is where you imagine one of those movie time-lapse clocks ... At some point it became fairly clear that I was working on Ubuntu, and the bug system work fell to other people.

Then, when Matt Zimmerman could no longer manage the entire Ubuntu team in Canonical by himself, Scott James Remnant and I stepped up to help him out. I did that for a couple of years, starting the Foundations team in the process. As the team grew I found that my interests really lay in hands-on development rather than in management, so I switched over to being the technical lead for Foundations, and have made my home there ever since. Over the years this has given me the opportunity to do all sorts of things, particularly working on our installers and on the GRUB boot loader, leading the development work on many of our archive maintenance tools, instituting the +1 maintenance effort and proposed-migration, and developing the Click package manager, and I've had the great pleasure of working with many exceptionally talented people.

However. In recent months I've been feeling a general sense of malaise and what I've come to recognise with hindsight as the symptoms of approaching burnout. I've been working long hours for a long time, and while I can draw on a lot of experience by now, it's been getting harder to summon the enthusiasm and creativity to go with that. I have a wonderful wife, amazing children, and lovely friends, and I want to be able to spend a bit more time with them.

After ten years doing the same kinds of things, I've accreted history with and responsibility for a lot of projects. One of the things I always loved about Foundations was that it's a broad church, covering a wide range of software and with a correspondingly wide range of opportunities; but, over time, this has made it difficult for me to focus on things that are important because there are so many areas where I might be called upon to help. I thought about simply stepping down from the technical lead position and remaining in the same team, but I decided that that wouldn't make enough of a difference to what matters to me. I need a clean break and an opportunity to reset my habits before I burn out for real.

One of the things that has consistently held my interest through all of this has been making sure that the infrastructure for Ubuntu keeps running reliably and that other developers can work efficiently. As part of this, I've been able to do a lot of work over the years on Launchpad where it was a good fit with my remit: this has included significant performance improvements to archive publishing, moving most archive administration operations from excessively-privileged command-line operations to the webservice, making build cancellation reliable across the board, and moving live filesystem building from an unscalable ad-hoc collection of machines into the Launchpad build farm. The Launchpad development team has generally welcomed help with open arms, and in fact I joined the ~launchpad team last year.
So, the logical next step for me is to make this informal involvement permanent. As such, at the end of this year I will be moving from Ubuntu Foundations to the Launchpad engineering team.

This doesn't mean me leaving Ubuntu. Within Canonical, Launchpad development is currently organised under the Continuous Integration team, which is part of Ubuntu Engineering. I'll still be around in more or less the usual places and available for people to ask me questions. But I will in general be trying to reduce my involvement in Ubuntu proper to things that are closely related to the operation of Launchpad, and a small number of low-effort things that I'm interested enough in to find free time for them. I still need to sort out a lot of details, but it'll very likely involve me handing over project leadership of Click, drastically reducing my involvement in the installer, and looking for at least some help with boot loader work, among others. I don't expect my Debian involvement to change, and I may well find myself more motivated there now that it won't be so closely linked with my day job, although it's possible that I will pare some things back that I was mostly doing on Ubuntu's behalf.

If you ask me for help with something over the next few months, expect me to be more likely to direct you to other people or suggest ways you can help yourself out, so that I can start disentangling myself from my current web of projects. Please contact me sooner or later if you're interested in helping out with any of the things I'm visible in right now, and we can see what makes sense. I'm looking forward to this!

15 April 2014

Colin Watson: Porting GHC: A Tale of Two Architectures

We had some requests to get GHC (the Glasgow Haskell Compiler) up and running on two new Ubuntu architectures: arm64, added in 13.10, and ppc64el, added in 14.04. This has been something of a saga, and has involved rather more late-night hacking than is probably good for me.

Book the First: Recalled to a life of strange build systems

You might not know it from the sheer bulk of uploads I do sometimes, but I actually don't speak a word of Haskell and it's not very high up my list of things to learn. But I am a pretty experienced build engineer, and I enjoy porting things to new architectures: I'm firmly of the belief that breadth of architecture support is a good way to shake out certain categories of issues in code, that it's worth doing aggressively across an entire distribution, and that, even if you don't think you need something now, new requirements have a habit of coming along when you least expect them and you might as well be prepared in advance. Furthermore, it annoys me when we have excessive noise in our build failure and proposed-migration output and I often put bits and pieces of spare time into gardening miscellaneous problems there, and at one point there was a lot of Haskell stuff on the list and it got a bit annoying to have to keep sending patches rather than just fixing things myself, and ... well, I ended up as probably the only non-Haskell-programmer on the Debian Haskell team and found myself fixing problems there in my free time. Life is a bit weird sometimes.

Bootstrapping packages on a new architecture is a bit of a black art that only a fairly small number of relatively bitter and twisted people know very much about. Doing it in Ubuntu is specifically painful because we've always forbidden direct binary uploads: all binaries have to come from a build daemon. Compilers in particular often tend to be written in the language they compile, and it's not uncommon for them to build-depend on themselves: that is, you need a previous version of the compiler to build the compiler, stretching back to the dawn of time where somebody put things together with a big magnet or something. So how do you get started on a new architecture? Well, what we do in this case is we construct a binary somehow (usually involving cross-compilation) and insert it as a build-dependency for a proper build in Launchpad. The ability to do this is restricted to a small group of Canonical employees, partly because it's very easy to make mistakes and partly because things like the classic "Reflections on Trusting Trust" are in the backs of our minds somewhere. We have an iron rule for our own sanity that the injected build-dependencies must themselves have been built from the unmodified source package in Ubuntu, although there can be source modifications further back in the chain. Fortunately, we don't need to do this very often, but it does mean that as somebody who can do it I feel an obligation to try and unblock other people where I can.

As far as constructing those build-dependencies goes, sometimes we look for binaries built by other distributions (particularly Debian), and that's pretty straightforward. In this case, though, these two architectures are pretty new and the Debian ports are only just getting going, and as far as I can tell none of the other distributions with active arm64 or ppc64el ports (or trivial name variants) has got as far as porting GHC yet. Well, OK. This was somewhere around the Christmas holidays and I had some time.
Muggins here cracks his knuckles and decides to have a go at bootstrapping it from scratch. It can't be that hard, right? Not to mention that it was a blocker for over 600 entries on that build failure list I mentioned, which is definitely enough to make me sit up and take notice; we'd even had the odd customer request for it.

Several attempts later and I was starting to doubt my sanity, not least for trying in the first place. We ship GHC 7.6, and upgrading to 7.8 is not a project I'd like to tackle until the much more experienced Haskell folks in Debian have switched to it in unstable. The porting documentation for 7.6 has bitrotted more or less beyond usability, and the corresponding documentation for 7.8 really isn't backportable to 7.6. I tried building 7.8 for ppc64el anyway, picking that on the basis that we had quicker hardware for it and it didn't seem likely to be particularly more arduous than arm64 (ho ho), and I even got to the point of having a cross-built stage2 compiler (stage1, in the cross-building case, is a GHC binary that runs on your starting architecture and generates code for your target architecture) that I could copy over to a ppc64el box and try to use as the base for a fully-native build, but it segfaulted incomprehensibly just after spawning any child process. Compilers tend to do rather a lot, especially when they're built to use GCC to generate object code, so this was a pretty serious problem, and it resisted analysis. I poked at it for a while but didn't get anywhere, and I had other things to do so declared it a write-off and gave up.

Book the Second: The golden thread of progress

In March, another mailing list conversation prodded me into finding a blog entry by Karel Gardas on building GHC for arm64. This was enough to be worth another look, and indeed it turned out that (with some help from Karel in private mail) I was able to cross-build a compiler that actually worked and could be used to run a fully-native build that also worked. Of course this was 7.8, since as I mentioned cross-building 7.6 is unrealistically difficult unless you're considerably more of an expert on GHC's labyrinthine build system than I am. OK, no problem, right? Getting a GHC at all is the hard bit, and 7.8 must be at least as capable as 7.6, so it should be able to build 7.6 easily enough ...

Not so much. What I'd missed here was that compiler engineers generally only care very much about building the compiler with older versions of itself, and if the language in question has any kind of deprecation cycle then the compiler itself is likely to be behind on various things compared to more typical code since it has to be buildable with older versions. This means that the removal of some deprecated interfaces from 7.8 posed a problem, as did some changes in certain primops that had gained an associated compatibility layer in 7.8 but nobody had gone back to put the corresponding compatibility layer into 7.6. GHC supports running Haskell code through the C preprocessor, and there's a __GLASGOW_HASKELL__ definition with the compiler's version number, so this was just a slog tracking down changes in git and adding #ifdef-guarded code that coped with the newer compiler (remembering that stage1 will be built with 7.8 and stage2 with stage1, i.e. 7.6, from the same source tree).
More inscrutably, GHC has its own packaging system called Cabal which is also used by the compiler build process to determine which subpackages to build and how to link them against each other, and some crucial subpackages weren't being built: it looked like it was stuck on picking versions from "stage0" (i.e. the initial compiler used as an input to the whole process) when it should have been building its own. Eventually I figured out that this was because GHC's use of its packaging system hadn't anticipated this case, and was selecting the higher version of the ghc package itself from stage0 rather than the version it was about to build for itself, and thus never actually tried to build most of the compiler. Editing ghc_stage1_DEPS in ghc/stage1/package-data.mk after its initial generation sorted this out. One late night building round and round in circles for a while until I had something stable, and a Debian source upload to add basic support for the architecture name (and other changes which were a bit over the top in retrospect: I didn't need to touch the embedded copy of libffi, as we build with the system one), and I was able to feed this all into Launchpad and watch the builders munch away very satisfyingly at the Haskell library stack for a while. This was all interesting, and finally all that work was actually paying off in terms of getting to watch a slew of several hundred build failures vanish from arm64 (the final count was something like 640, I think). The fly in the ointment was that ppc64el was still blocked, as the problem there wasn't building 7.6, it was getting a working 7.8. But now I really did have other much more urgent things to do, so I figured I just wouldn't get to this by release time and stuck it on the figurative shelf. Book the Third: The track of a bug Then, last Friday, I cleared out my urgent pile and thought I'd have another quick look. (I get a bit obsessive about things like this that smell of "interesting intellectual puzzle".) slyfox on the #ghc IRC channel gave me some general debugging advice and, particularly usefully, a reduced example program that I could use to debug just the process-spawning problem without having to wade through noise from running the rest of the compiler. I reproduced the same problem there, and then found that the program crashed earlier (in stg_ap_0_fast, part of the run-time system) if I compiled it with +RTS -Da -RTS. I nailed it down to a small enough region of assembly that I could see all of the assembly, the source code, and an intermediate representation or two from the compiler, and then started meditating on what makes ppc64el special. You see, the vast majority of porting bugs come down to what I might call gross properties of the architecture. You have things like whether it's 32-bit or 64-bit, big-endian or little-endian, whether char is signed or unsigned, that sort of thing. There's a big table on the Debian wiki that handily summarises most of the important ones. Sometimes you have to deal with distribution-specific things like whether GL or GLES is used; often, especially for new variants of existing architectures, you have to cope with foolish configure scripts that think they can guess certain things from the architecture name and get it wrong (assuming that powerpc* means big-endian, for instance). We often have to update config.guess and config.sub, and on ppc64el we have the additional hassle of updating libtool macros too. 
But I've done a lot of this stuff and I'd accounted for everything I could think of. ppc64el is actually a lot like amd64 in terms of many of these porting-relevant properties, and not even that far off arm64 which I'd just successfully ported GHC to, so I couldn't be dealing with anything particularly obvious. There was some hand-written assembly which certainly could have been problematic, but I'd carefully checked that this wasn't being used by the "unregisterised" (no specialised machine dependencies, so relatively easy to port but not well-optimised) build I was using. A problem around spawning processes suggested a problem with SIGCHLD handling, but I ruled that out by slowing down the first child process that it spawned and using strace to confirm that SIGSEGV was the first signal received. What on earth was the problem? From some painstaking gdb work, one thing I eventually noticed was that stg_ap_0_fast's local stack appeared to be being corrupted by a function call, specifically a call to the colourfully-named debugBelch. Now, when IBM's toolchain engineers were putting together ppc64el based on ppc64, they took the opportunity to fix a number of problems with their ABI: there's an OpenJDK bug with a handy list of references. One of the things I noticed there was that there were some stack allocation optimisations in the new ABI, which affected functions that don't call any vararg functions and don't call any functions that take enough parameters that some of them have to be passed on the stack rather than in registers. debugBelch takes varargs: hmm. Now, the calling code isn't quite in C as such, but in a related dialect called "Cmm", a variant of C-- (yes, minus), that GHC uses to help bridge the gap between the functional world and its code generation, and which is compiled down to C by GHC. When importing C functions into Cmm, GHC generates prototypes for them, but it doesn't do enough parsing to work out the true prototype; instead, they all just get something like extern StgFunPtr f(void);. In most architectures you can get away with this, because the arguments get passed in the usual calling convention anyway and it all works out, but on ppc64el this means that the caller doesn't generate enough stack space and then the callee tries to save its varargs onto the stack in an area that in fact belongs to the caller, and suddenly everything goes south. Things were starting to make sense. Now, debugBelch is only used in optional debugging code; but runInteractiveProcess (the function associated with the initial round of failures) takes no fewer than twelve arguments, plenty to force some of them onto the stack. I poked around the GCC patch for this ABI change a bit and determined that it only optimised away the stack allocation if it had a full prototype for all the callees, so I guessed that changing those prototypes to extern StgFunPtr f(); might work: it's still technically wrong, not least because omitting the parameter list is an obsolescent feature in C11, but it's at least just omitting information about the parameter list rather than actively lying about it. I tweaked that and ran the cross-build from scratch again. Lo and behold, suddenly I had a working compiler, and I could go through the same build-7.6-using-7.8 procedure as with arm64, much more quickly this time now that I knew what I was doing. One upstream bug, one Debian upload, and several bootstrapping builds later, and GHC was up and running on another architecture in Launchpad. Success! 
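For anyone who wants to see the shape of that problem in plain C, here is a minimal, hypothetical sketch - it is not GHC's generated code, and the names are made up - with the explanation of where the ELFv2 caller goes wrong carried in the comments:

/* belch-sketch.c: illustrative only, not GHC's actual generated code. */
#include <stdio.h>
#include <stdarg.h>

/* The real callee is variadic, like debugBelch: */
static void belch(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

int main(void)
{
    /* With this correct prototype in scope, an ELFv2 caller reserves a
     * parameter-save area on its own stack, and the callee can spill its
     * variadic register arguments into it safely.
     *
     * GHC's Cmm importer instead declares every imported C function as
     * roughly "extern StgFunPtr f(void);".  A caller that believes that
     * prototype sees no varargs and no stack-passed arguments, so under
     * the new ABI it may omit the save area entirely; the callee then
     * writes into stack space that still belongs to the caller, which is
     * the corruption seen in stg_ap_0_fast.  Declaring the import as
     * "extern StgFunPtr f();" (an unspecified parameter list) keeps the
     * conservative stack layout. */
    belch("spawning %s with %d arguments\n", "a child process", 12);
    return 0;
}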
Epilogue There's still more to do. I gather there may be a Google Summer of Code project in Linaro to write proper native code generation for GHC on arm64: this would make things a good deal faster, but also enable GHCi (the interpreter) and Template Haskell, and thus clear quite a few more build failures. Since there's already native code generation for ppc64 in GHC, getting it going for ppc64el would probably only be a couple of days' work at this point. But these are niceties by comparison, and I'm more than happy with what I got working for 14.04. The upshot of all of this is that I may be the first non-Haskell-programmer to ever port GHC to two entirely new architectures. I'm not sure if I gain much from that personally aside from a lot of lost sleep and being considered extremely strange. It has, however, been by far the most challenging set of packages I've ported, and a fascinating trip through some odd corners of build systems and undefined behaviour that I don't normally need to touch.

18 January 2014

Colin Watson: Testing wanted: GRUB 2.02~beta2 Debian/Ubuntu packages

This is mostly a repost of my ubuntu-devel mail for a wider audience, but see below for some additions. I'd like to upgrade to GRUB 2.02 for Ubuntu 14.04; it's currently in beta. This represents a year and a half of upstream development, and contains many new features, which you can see in the NEWS file. Obviously I want to be very careful with substantial upgrades to the default boot loader. So, I've put this in trusty-proposed, and filed a blocking bug to ensure that it doesn't reach trusty proper until it's had a reasonable amount of manual testing. If you are already using trusty and have some time to try this out, it would be very helpful to me. I suggest that you only attempt this if you're comfortable driving apt-get directly and recovering from errors at that level, and if you're willing to spend time working with me on narrowing down any problems that arise. Please ensure that you have rescue media to hand before starting testing. The simplest way to upgrade is to enable trusty-proposed, upgrade ONLY packages whose names start with "grub" (e.g. use apt-get dist-upgrade to show the full list, say no to the upgrade, and then pass all the relevant package names to apt-get install), and then (very important!) disable trusty-proposed again. Provided that there were no errors in this process, you should be safe to reboot. If there were errors, you should be able to downgrade back to 2.00-22 (or 1.27+2.00-22 in the case of grub-efi-amd64-signed). Please report your experiences (positive and negative) with this upgrade in the tracking bug. I'm particularly interested in systems that are complex in any way: UEFI Secure Boot, non-trivial disk setups, manual configuration, that kind of thing. If any of the problems you see are also ones you saw with earlier versions of GRUB, please identify those clearly, as I want to prioritise handling regressions over anything else. I've assigned myself to that bug to ensure that messages to it are filtered directly into my inbox. I'll add a couple of things that weren't in my ubuntu-devel mail. Firstly, this is all in Debian experimental as well (I do all the work in Debian and sync it across, so the grub2 source package in Ubuntu is a verbatim copy of the one in Debian these days). There are some configuration differences applied at build time, but a large fraction of test cases will apply equally well to both. I don't have a definite schedule for pushing this into jessie yet - I only just finished getting 2.00 in place there, and the release schedule gives me a bit more time - but I certainly want to ship jessie with 2.02 or newer, and any test feedback would be welcome. It's probably best to just e-mail feedback to me directly for now, or to the pkg-grub-devel list. Secondly, a couple of news sites have picked this up and run it as "Canonical intends to ship Ubuntu 14.04 LTS with a beta version of GRUB". This isn't in fact my intent at all. I'm doing this now because I think GRUB 2.02 will be ready in non-beta form in time for Ubuntu 14.04, and indeed that putting it in our development release will help to stabilise it; I'm an upstream GRUB developer too and I find the exposure of widely-used packages very helpful in that context. It will certainly be much easier to upgrade to a beta now and a final release later than it would be to try to jump from 2.00 to 2.02 in a month or two's time. 
Even if there's some unforeseen delay and 2.02 isn't released in time, though, I think nearly three months of stabilisation is still plenty to yield a boot loader that I'm comfortable with shipping in an LTS. I've been backporting a lot of changes to 2.00 and even 1.99, and, as ever for an actively-developed codebase, it gets harder and harder over time (in particular, I've spent longer than I'd like hunting down and backporting fixes for non-512-byte sector disks). While I can still manage it, I don't want to be supporting 2.00 for five more years after upstream has moved on; I don't think that would be in anyone's best interests. And I definitely want some of the new features which aren't sensibly backportable, such as several of the new platforms (ARM, ARM64, Xen) and various networking improvements; I can imagine a number of our users being interested in things like optional signature verification of files GRUB reads from disk, improved Mac support, and the TrueCrypt ISO loader, just to name a few. This should be a much stronger base for five-year support.

26 October 2012

Colin Watson: Automatic installability checking

I've just finished deploying automatic installability checking for Ubuntu's development release, which is more or less equivalent to the way that uploads are promoted from Debian unstable to testing. See my ubuntu-devel post and my ubuntu-devel-announce post for details. This now means that we'll be opening the archive for general development once glibc 2.16 packages are ready. I'm very excited about this because it's something I've wanted to do for a long, long time. In fact, back in 2004 when I had my very first telephone conversation with a certain spaceman about this crazy Debian-based project he wanted me to work on, I remember talking about Debian's testing migration system and some ways I thought it could be improved. I don't remember the details of that conversation any more and what I just deployed may well bear very little resemblance to it, but it should transform the extent to which our development release is continuously usable. The next step is to hook in autopkgtest results. This will allow us to do a degree of automatic testing of reverse-dependencies when we upgrade low-level libraries.

27 May 2012

Colin Watson: OpenSSH 6.0p1

OpenSSH 6.0p1 was released a little while back; this weekend I belatedly got round to uploading packages of it to Debian unstable and Ubuntu quantal. I was a bit delayed by needing to put together an improvement to privsep sandbox selection that particularly matters in the context of distributions. One of the experts on seccomp_filter has commented favourably on it, but I haven't yet had a comment from upstream themselves, so I may need to refine this depending on what they say. (This is a good example of how it matters that software is often not built on the system that it's going to run on, and in particular that the kernel version is rather likely to be different. Where possible it's always best to detect kernel capabilities at run-time rather than at build-time.) I didn't make it very clear in the changelog, but using the new seccomp_filter sandbox currently requires UsePrivilegeSeparation sandbox in sshd_config as well as a capable kernel. I won't change the default here in advance of upstream, who still consider privsep sandboxing experimental.
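As a rough sketch of what run-time detection can look like on Linux (this is not OpenSSH's code, and the "rlimit" fallback named in the message is simply the pre-existing privsep sandbox): a commonly used probe is to call prctl() with SECCOMP_MODE_FILTER and a NULL filter, which in practice fails with EFAULT on kernels that support seccomp filtering (the mode is recognised but the pointer is bad) and with EINVAL on kernels that do not recognise the mode.

/* seccomp-probe.c: a sketch of run-time capability detection, not OpenSSH's code. */
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_SECCOMP
#define PR_SET_SECCOMP 22
#endif
#ifndef SECCOMP_MODE_FILTER
#define SECCOMP_MODE_FILTER 2
#endif

int main(void)
{
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, NULL, 0, 0) == 0) {
        /* should not happen with a NULL filter */
        fprintf(stderr, "unexpectedly entered filter mode\n");
        return 1;
    }
    if (errno == EFAULT)
        puts("running kernel supports seccomp filtering: use the filter sandbox");
    else
        puts("no seccomp filter support at run-time: fall back to the rlimit sandbox");
    return 0;
}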

2 March 2012

Colin Watson: libpipeline 1.2.1 released

I've released libpipeline 1.2.1, and uploaded it to Debian unstable. This is a bug-fix release:

30 January 2012

Colin Watson: APT resolver bugs

I've managed to go for eleven years working on Debian and nearly eight on Ubuntu without ever needing to teach myself how APT's resolver works. I get the impression that there's a certain mystique about it in general (alternatively, I'm just the last person to figure this out). Recently, though, I had a couple of Ubuntu upgrade bugs to fix that turned out to be bugs in the resolver, and I thought it might be interesting to walk through the process of fixing them based on the Debug::pkgProblemResolver=true log files. Breakage with Breaks The first was Ubuntu bug #922485 (apt.log). To understand the log, you first need to know that APT makes up to ten passes of the resolver to attempt to fix broken dependencies by upgrading, removing, or holding back packages; if there are still broken packages after this point, it's generally because it's got itself stuck in some kind of loop, and it bails out rather than carrying on forever. The current pass number is shown in each "Investigating" log entry, so they start with "Investigating (0)" and carry on up to at most "Investigating (9)". Any packages that you see still being investigated on the tenth pass are probably something to do with whatever's going wrong. In this case, most packages have been resolved by the end of the fourth pass, but xserver-xorg-core is causing some trouble. (Not a particular surprise, as it's an important package with lots of relationships.) We can see that each breakage is:
Broken xserver-xorg-core:i386 Breaks on xserver-xorg-video-6 [ i386 ] < none > ( none )
This is a Breaks (a relatively new package relationship type introduced a few years ago as a sort of weaker form of Conflicts) on a virtual package, which means that in order to unpack xserver-xorg-core each package that provides xserver-xorg-video-6 must be deconfigured. Much like Conflicts, APT responds to this by upgrading providing packages to versions that don't provide the offending virtual package if it can, and otherwise removing them. We can see it doing just that in the log (some lines omitted):
Investigating (0) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-tseng:i386
Investigating (1) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-i740:i386
Investigating (2) xserver-xorg-core [ i386 ] < 2:1.7.6-2ubuntu7.10 -> 2:1.11.3-0ubuntu8 > ( x11 )
  Fixing xserver-xorg-core:i386 via remove of xserver-xorg-video-nv:i386
OK, so that makes sense - presumably upgrading those packages didn't help at the time. But look at the pass numbers. Rather than just fixing all the packages that provide xserver-xorg-video-6 in a single pass, which it would be perfectly able to do, it only fixes one per pass. This means that if a package Breaks a virtual package which is provided by more than ten installed packages, the resolver will fail to handle that situation. On inspection of the code, this was being handled correctly for Conflicts by carrying on through the list of possible targets for the dependency relation in that case, but apparently when Breaks support was implemented in APT this case was overlooked. The fix is to carry on through the list of possible targets for any "negative" dependency relation, not just Conflicts, and I've filed a patch as Debian bug #657695. My cup overfloweth The second bug I looked at was Ubuntu bug #917173 (apt.log). Just as in the previous case, we can see the resolver "running out of time" by reaching the end of the tenth pass with some dependencies still broken. This one is a lot less obvious, though. The last few entries clearly indicate that the resolver is stuck in a loop:
Investigating (8) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (8) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386
Investigating (9) dpkg [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( admin )
Broken dpkg:i386 Breaks on dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils ) (< 1.15.8)
  Considering dpkg-dev:i386 29 as a solution to dpkg:i386 7205
  Upgrading dpkg-dev:i386 due to Breaks field in dpkg:i386
Investigating (9) dpkg-dev [ i386 ] < 1.15.5.6ubuntu4.5 -> 1.16.1.2ubuntu5 > ( utils )
Broken dpkg-dev:i386 Depends on libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl ) (= 1.16.1.2ubuntu5)
  Considering libdpkg-perl:i386 12 as a solution to dpkg-dev:i386 29
  Holding Back dpkg-dev:i386 rather than change libdpkg-perl:i386
The new version of dpkg requires upgrading dpkg-dev, but it can't because of something wrong with libdpkg-perl. Following the breadcrumb trail back through the log, we find:
Investigating (1) libdpkg-perl [ i386 ] < none -> 1.16.1.2ubuntu5 > ( perl )
Broken libdpkg-perl:i386 Depends on perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
  Considering perl:i386 1472 as a solution to libdpkg-perl:i386 12
  Holding Back libdpkg-perl:i386 rather than change perl:i386
Investigating (1) perl [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl:i386 Depends on perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl ) (= 5.14.2-6ubuntu1)
  Considering perl-base:i386 5806 as a solution to perl:i386 1472
  Removing perl:i386 rather than change perl-base:i386
Investigating (1) perl-base [ i386 ] < 5.10.1-8ubuntu2.1 -> 5.14.2-6ubuntu1 > ( perl )
Broken perl-base:i386 PreDepends on libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (>= 2.11)
  Considering libc6:i386 -17473 as a solution to perl-base:i386 5806
  Added libc6:i386 to the remove list
Investigating (0) libc6 [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs )
Broken libc6:i386 Depends on libc-bin [ i386 ] < 2.11.1-0ubuntu7.8 -> 2.13-24ubuntu2 > ( libs ) (= 2.11.1-0ubuntu7.8)
  Considering libc-bin:i386 10358 as a solution to libc6:i386 -17473
  Removing libc6:i386 rather than change libc-bin:i386
So ultimately the problem is something to do with libc6; but what? As Steve Langasek said in the bug, libc6's dependencies have been very carefully structured, and surely we would have seen some hint of it elsewhere if they were wrong. At this point ideally I wanted to break out GDB or at the very least experiment a bit with apt-get, but due to some tedious local problems I hadn't been able to restore the apt-clone state file for this bug onto my system so that I could attack it directly. So I fell back on the last refuge of the frustrated debugger and sat and thought about it for a bit. Eventually I noticed something. The numbers after the package names in the third line of each of these log entries are "scores": roughly, the more important a package is, the higher its score should be. The function that calculates these is pkgProblemResolver::MakeScores() in apt-pkg/algorithms.cc. Reading this, I noticed that the various values added up to make each score are almost all provably positive, for example:
         Scores[I->ID] += abs(OldScores[D.ParentPkg()->ID]);
The only exceptions are an initial -1 or -2 points for Priority: optional or Priority: extra packages respectively, or some values that could theoretically be configured to be negative but weren't in this case. OK. So how come libc6 has such a huge negative score of -17473, when one would normally expect it to be an extremely powerful package with a large positive score? Oh. This is computer programming, not mathematics ... and each score is stored in a signed short, so in a sufficiently large upgrade all those bonus points add up to something larger than 32767 and everything goes haywire. Bingo. Make it an int instead - the number of installed packages is going to be on the order of tens of thousands at most, so it's not as though it'll make a substantial difference to the amount of memory used - and chances are everything will be fine. I've filed a patch as Debian bug #657732. I'd expected this to be a pretty challenging pair of bugs. While I certainly haven't lost any respect for the APT maintainers for dealing with this stuff regularly, it wasn't as bad as I thought. I'd expected to have to figure out how to retune some slightly out-of-balance heuristics and not really know whether I'd broken anything else in the process; but in the end both patches were very straightforward.
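To make the arithmetic concrete, here is a toy C program (nothing to do with APT's actual code) showing how a pile of small positive bonuses overflows a 16-bit signed accumulator and comes out negative, while a plain int holds the same total comfortably:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t short_score = 0;  /* the width the scores were stored in */
    int     int_score   = 0;  /* the width the patch changes them to */

    /* tens of thousands of installed packages, each adding a few
     * "provably positive" bonus points */
    for (int i = 0; i < 20000; i++) {
        short_score = (int16_t)(short_score + 5);  /* conversion wraps on common
                                                      platforms once the sum
                                                      passes 32767 */
        int_score += 5;
    }

    printf("signed short accumulator: %d\n", short_score);  /* negative */
    printf("int accumulator:          %d\n", int_score);    /* 100000 */
    return 0;
}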

24 October 2011

Colin Watson: Quality in Ubuntu 12.04 LTS

As is natural for an LTS cycle, lots of people are thinking and talking about work focused on quality rather than features. With Canonical extending LTS support to five years on the desktop for 12.04, much of this is quite rightly focused on the desktop. I'm really not a desktop hacker in any way, shape, or form, though. I spent my first few years in Ubuntu working mainly on the installer - I still do, although I do some other things now too - and I used to say only half-jokingly that my job was done once X started. Of course there are plenty of bugs I can fix, but I wanted to see if I could do something with a bit more structure, so I got to thinking about projects we could work on at the foundations level that would make a big difference. Image build pipeline One difficulty we have is that quite a few of our bugs - especially installer bugs, although this goes for some other things too - are only really caught when people are doing coordinated image testing just before a milestone release. Now, it takes a while to do all the builds and then it takes a while to test them. The excellent work of the QA team has meant that testing is much quicker now than it used to be, and a certain amount of smoke-testing is automated (particularly for server images). On the other hand, the build phase has only got longer as we've added more flavours and architectures, particularly as some parts of the process are still serialised per architecture or subarchitecture so ARM builds in particular take a very long time indeed. Exact timings are a bit difficult to get for various reasons, but I think the minimum time between a developer uploading a fix and us having a full set of candidate images on all architectures including that fix is currently somewhere north of eight hours, and that's with people cutting corners and pulling strings which is a suboptimal thing to have to do around release time. This obviously makes us reluctant to respin for anything short of showstopper bugs. If we could get things down to something closer to two hours, respins would be a much less horrible proposition and so we might be able to fix a few bugs that are serious but not showstoppers, not to mention that the release team would feel less burned out. We discussed this problem at the release sprint, and came up with a laundry list of improvements; I've scheduled this for discussion at UDS in case we can think of any more. Please come along if you're interested! One thing in particular that I'm working on is refactoring Germinate, a tool which dates right back to our first meeting before Ubuntu was even called Ubuntu and whose job is to expand dependencies starting from our lists of "seed" packages; we use this, among other things, to generate Task fields in the archive and to decide which packages to copy into our images. This was acceptably quick in 2004, but now that we run it forty times (eight flavours multiplied by five architectures) at the end of every publisher run it's actually become rather a serious performance problem: cron.germinate takes about ten minutes, which is over a third of the typical publisher runtime. It parses Packages files eight times as often as it needs to, Sources files forty times as often as it needs to, and recalculates the dependency tree of the base system five times as often as it needs to. 
I am confident that we can significantly reduce the runtime here, and I think there's some hope that we might be able to move the publisher back to a 30-minute cycle, which would increase the velocity of Ubuntu development in general. Maintaining the development release Our release cycle always starts with syncing and merging packages from Debian unstable (or testing in the case of LTS cycles). The vast majority of packages in Ubuntu arrive this way, and generally speaking if we didn't do this we would fall behind in ways that would be difficult to recover from later. However, this does mean that we get a "big bang" of changes at the start of the cycle, and it takes a while for the archive to be usable again. Furthermore, even once we've taken care of this, we have a long-established rhythm where the first part of the cycle is mainly about feature development and the second part of the cycle is mainly about stabilisation. As a result, we've got used to the archive being fairly broken for the first few months, and we even tell people that they shouldn't expect things to work reliably until somewhere approaching beta. This makes some kind of sense from the inside. But how are you supposed to do feature development that relies on other things in the development release? In the first few years of Ubuntu, this question didn't matter very much. Nearly all the people doing serious feature development were themselves serious Ubuntu developers; they were capable of fixing problems in the development release as they went along, and while it got in their way a little bit it wasn't all that big a deal. Now, though, we have people focusing on things like Unity development, and we shouldn't assume that just because somebody is (say) an OpenGL expert or a window management expert that they should be able to recover from arbitrary failures in development release upgrades. One of the best things we could do to help the 12.04 desktop be more stable is to have the entire system be less unstable as we go along, so that developers further up the stack don't have to be distracted by things wobbling underneath them. Plus, it's just good software engineering to keep the basics working as you go along: it should always build, it should always install, it should always upgrade. Ubuntu is too big to do something like having everyone stop any time the build breaks, the way you might do in a smaller project, but we shouldn't let things slide for months either. I've been talking to Rick Spencer and the other Ubuntu engineering leads at Canonical about this. Canonical has a system of "rotations", where you can go off to another team for a while if you're in need of a change or want to branch out a bit; so I proposed that we allow our engineers to spend a month or two at a time on what I'm calling the +1 Maintenance Team, whose job is simply to keep the development release buildable, installable, and upgradeable at all times. Rick has been very receptive to this, and we're going to be running this as a trial throughout the 12.04 cycle, with probably about three people at a time. As well as being professional archive gardeners, these people will also work on developing infrastructure to help us keep better track of what we need to do. 
For instance, we could deploy better tools from Debian QA to help us track uninstallable packages, or we could enhance some of our many existing reports to have bug links and/or comment facilities, or we could spruce up the weather report; there are lots of things we could do to make our own lives easier. By 12.04, I would like, in no particular order: Of course, this overlaps to a certain extent with the kinds of things that the MOTU team have been doing for years, not to mention with what all developers should be doing to keep their own houses in reasonable order, and I'd like us to work together on this; we're trying to provide some extra hands here to make Ubuntu better for everyone, not take over! I would love this to be an opportunity to re-energise MOTU and bring some new people on board. I've registered a couple of blueprints (priorities, infrastructure) for discussion at UDS. These are deliberately open-ended skeleton sessions, and I'll try to make sure they're scheduled fairly early in the week, so that we have time for break-out sessions later on. If you're interested, please come along and give your feedback!

19 October 2011

Steve Langasek: Debian: not stale, just hardened

Raphaël Hertzog recently announced a new dpkg-buildflags interface in dpkg that at long last gives the distribution, the package maintainers, and users the control they want over the build flags used when building packages. The announcement mail gives all the gory details about how to invoke dpkg-buildflags in your build to be compliant; but the nice thing is, if you're using dh(1) with debian/compat=9, debhelper does it for you automatically so long as you're using a build system that it knows how to pass compiler flags to. So for the first time, /usr/share/doc/debhelper/examples/rules.tiny can now be used as-is to provide a policy-compliant package by default (setting -g -O2 or -g -O0 for your build regardless of how debian/rules is invoked). Of course, none of my packages actually work that way; among other things I have a habit of liberally sprinkling DEB_CFLAGS_MAINT_APPEND := -Wall in my rules, and sometimes DEB_LDFLAGS_MAINT_APPEND := -Wl,-z,defs and DEB_CFLAGS_MAINT_APPEND := $(shell getconf LFS_CFLAGS) as well. And my upstreams' build systems rarely work 100% out of the box with dh_auto_* without one override or another somewhere. So in practice, the shortest debian/rules file in any of my packages seems to be 13 lines currently. But that's 13 lines of almost 100% signal, unlike the bad old days of cut'n'pasted dh_* command lists. The biggest benefit, though, isn't in making it shorter to write a rules file with the old, standard build options. The biggest benefit is that dpkg-buildflags now also outputs build-hardening compiler and linker flags by default on Debian. Specifically, using the new interface lets you pick up all of these hardening flags for free:
-fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -Wl,-z,relro
It also lets you get -fPIE and -Wl,-z,now by adding this one line to your debian/rules (assuming you're using dh(1) and compat 9):
export DEB_BUILD_MAINT_OPTIONS := hardening=+pie,+bindnow
Converting all my packages to use dh(1) has always been a long-term goal, but some packages are easier to convert than others. This was the tipping point for me, though. Even though debhelper compat level 9 isn't yet frozen, meaning there might still be other behavior changes to it that will make more work for me between now and release, over the past couple of weekends I've been systematically converting all my packages to use it with dh. In particular, pam and samba have been rebuilt to use the default hardening flags, and openldap uses these flags plus PIE support. (Samba already builds with PIE by default courtesy of upstream.) You can't really make samba and openldap out on the graph, but they're there (with their rules files reduced by 50% or more). I cannot overstate the significance of proactive hardening. There have been a number of vulnerabilities over the past few years that have been thwarted on Ubuntu because Ubuntu is using -fstack-protector by default. Debian has a great security team that responds quickly to these issues as soon as they're revealed, but we don't always get to find out about them before they're already being exploited in the wild. In this respect, Debian has lagged behind other distros. With dpkg-buildflags, we now have the tools to correct this. It's just a matter of getting packages to use the new interfaces. If you're a maintainer of a security sensitive package (such as a network-facing daemon or a setuid application), please enable dpkg-buildflags in your package for wheezy! (Preferably with PIE as well.) And if you don't maintain security sensitive packages, you can still help out with the hardening release goal.

6 October 2011

Colin Watson: Top ideas on Ubuntu Brainstorm (August 2011)

The Ubuntu Technical Board conducts a regular review of the most popular Ubuntu Brainstorm ideas (previous reviews conducted by Matt Zimmerman and Martin Pitt). This time it was my turn. Apologies for the late arrival of this review. Contact lens in the Unity Dash (#27584) Unity supports Lenses, which provide a consistent way for users to quickly search for information via the Dash. Current lenses include Applications, Files, and Music, but a number of people have asked for contacts to be accessible using the same interface. While Canonical's DX team isn't currently working on this for Ubuntu 11.10 or 12.04, we'd love somebody who's interested in this to get involved. Allison Randal explains how to get started, including some skeleton example code and several useful links. Displaying Ubuntu version information (#27460) Several people have asked for it to be more obvious what Ubuntu version they're running, as well as other general information about their system. John Lea, user experience architect on the Unity team, responds that in Ubuntu 11.10 the new LightDM greeter shows the Ubuntu version number, making that basic information very easily visible. For more detail, System Settings -> System Info provides a simple summary. Volume adjustments for headphone use (#27275) People often find that they need to adjust their sound volume when plugging in or removing headphones. It seems as though the computer ought to be able to remember this kind of thing and do it automatically; after all, a major goal of Ubuntu is to make the desktop Just Work. David Henningson, a member of Canonical's OEM Services group and an Ubuntu audio developer, responds on his blog with a summary of how PulseAudio jack detection has improved matters in Ubuntu 11.10, and what's left to do:
The good news: in the upcoming Ubuntu Oneiric (11.10), this is actually working. The bad news: it isn't working for everyone.
Making it easier to find software to handle a file (#28148) Ubuntu is not always as helpful as it could be when you don't have the right software installed to handle a particular file. Michael Vogt, one of the developers of the Ubuntu Software Center, responded to this. It seems that most of the pieces to make this work nicely are in place, but there are a few more bits of glue required:
Thanks a lot for this suggestion. I like the idea and it's something that software-center itself supports now. In the coming version 5.0 we will offer to "sort by top-rated" (based on the ratings&reviews data). It's also possible to search for an application based on its mime data. To search for a mime-type, you can enter "mime:text/html" or "mime:audio/ogg" into the search field. What is needed however is better integration into the file manager nautilus. I will make sure this gets attention at the next developer meeting and filed bug #860536 about it. In nautilus, there is now a button called "Find applications online" available as an option when opening an unknown file or when the user selects "open with...other application" in the context menu. But that will not use the data from software-center.
Show pop-up alert on low battery (#28037) Some users have reported on Brainstorm that they are not alerted frequently enough when their laptop's battery is low, as they clearly ought to be. This is an odd one, because there are already several power alert levels and this has been working well for us for some time. Nevertheless, enough people have voted for this idea that there must be something behind it, perhaps a bug that only affects certain systems. Martin Pitt, technical lead of the Ubuntu desktop team, has responded directly to the Brainstorm idea with a description of the current system and how to file a bug when it does not work as intended.

16 September 2011

Raphaël Hertzog: How to triage bugs in the Debian bug tracking system

Triaging bugs is one of the easiest ways to start contributing to Debian. I'll teach you the basics in this article. 1. Prerequisites All interactions with the Debian Bug Tracking System (BTS) happen through email, so you need to have an email account with an address that you're willing to make public. All the mail that you send to the BTS will be archived and publicly available through its web interface. This also means that you should have some spam filters in place, because it will inevitably be harvested by spammers. :-( To ensure that this email address is consistently used by the various tools that we're going to use, it's a good idea to put this email address in the DEBEMAIL environment variable. You can also specify your full name in DEBFULLNAME (in case you don't want to use the name associated with your Unix account). You usually do this by modifying ~/.bashrc (if you use bash as your login shell):
export DEBEMAIL="hertzog@debian.org"
export DEBFULLNAME="Raphaël Hertzog"
You should also install the devscripts package; it provides the bts command that we're going to use. 2. Find a package or a team with too many bugs You can literally pick any popular piece of software that's in Debian: such packages almost always get more bug reports than the maintainers can handle. Instead of picking a package, you can also select a packaging team and concentrate your efforts on the set of packages managed by the team. In any case, it's important to receive the bug traffic for the packages that you're going to work on. If you went for a specific package, you should subscribe to the package via the Package Tracking System (there's a subscribe box in the bottom left corner once you have selected the source package of interest). If you decided to help a team, there's usually a dedicated mailing list receiving all bug traffic. You can browse a list of packages with the most bugs if you have trouble finding a package to work on. 3. Triage bugs! Bug triaging is all about making sure that bugs are correctly classified, so that when a developer looks at the bug list, he can quickly find bugs with all the information required to be able to fix them! 3.1 Adding information to a bug Adding supplementary information is easily done just by sending a mail to XXXX@bugs.debian.org (replace XXXX with the bug number). But often you want to reply to a message in the bug history; in that case, bts --mbox show XXXX is for you. It will grab the corresponding mailbox and open a mailer (mutt by default) on it. Now you can reply directly in your favorite mailer. 3.2 Classifying bugs The Debian BTS uses tags (click the link and read the doc!) to classify bugs. bts tag XXXX + foo will add the foo tag (replace the + with a - to remove a tag). If you want to explain why you're adding a tag, you should instead reply in the bug log as explained above, put control@bugs.debian.org in Bcc (Blind Carbon Copy), and start the body of your message with your tag command:
tag XXXX + foo
thanks
But what tag should you add? When a bug is submitted, you should try to reproduce the bug. If you can reproduce it, then tag the bug "confirmed" (example in #641710). If you can't, you should request more information (e.g. a sample document triggering the bug, a configuration file, the output of some relevant command, etc.) until you can reproduce it or conclude that it was a user mistake. When you request supplementary information for this reason, you should tag the bug "unreproducible moreinfo" (example in #526774). "moreinfo" should later be dropped when the requested information is provided, and "unreproducible" should be dropped if that information was enough to actually help reproduce the bug (example in #526774). During that initial evaluation, it's also worth differentiating packaging bugs (which are specific to Debian) from upstream bugs (which are also relevant for non-Debian users). The latter should be tagged "upstream" (and forwarded upstream if the bug is reproducible or contains enough information for the upstream developers; example in #635112). If you see a (viable) patch in the bug log, the bug should be tagged "patch". This is usually done by the patch submitter, but sometimes it's forgotten (example in #632460). Take care, though, not to reinstate the "patch" tag if it was initially set but then dropped by the package maintainer after a review of the patch. If the title of the bug report is not descriptive enough, you can change it with a "retitle XXXX new-title" command (example in #170850). You can also change the severity of the bug report depending on the impact of the problem (with a "severity XXXX new-severity" command, what a surprise!). Requests for new features are "wishlist"; most documentation problems are "minor". On the other side of the scale, you can use "important" for bugs that are very annoying but that should not block a release. "serious", "grave" and "critical" are used for release-critical bugs; check the official definitions of the severities (examples in #502738 or #506498). 3.3 Closing non-bugs and bugs that are already fixed If your analysis of the bug report is that it's not really a bug but a user mistake, then you should close it by sending a mail to XXXX-done@bugs.debian.org with some explanation of the user's mistake so that he can get past his problem (example in #592853). If the problem was a real bug, but one that is apparently already fixed, you should try to quickly find the version that fixed the bug. If you can't find it in the changelog (there's a link to it in the PTS, or you can use /usr/share/doc/package/changelog.Debian.gz), you can make the safe assumption that the upstream version you're currently using is the first one where this is fixed. Then you send a mail to XXXX-done@bugs.debian.org, but you start your mail with "Version: version-that-fixed-the-bug" and continue with a small explanation of why you believe the bug to be fixed by this version (example in #122948). 3.4 Reassigning misfiled bug reports Bug reports are not always filed against the proper package. Users file bugs against the applications where they experience the bugs, but the real bug might be in an underlying library or application. When that happens, you should use the "reassign XXXX correct-package version" command to get it filed against the correct package. The version parameter is optional but should be provided if possible; it should be the oldest version that we know to have the problem (example in #626232).
3.5 Forwarding bugs Forwarding bugs means opening bug reports in the upstream bug tracker for issues that have been reported in Debian but that apply to the upstream (unmodified) source code. Be sure to include all the relevant information and a link to the corresponding Debian bug. Depending on the upstream bug tracker, you might have to open an account to be able to file new bug reports. On the Debian side, you must record that a bug has been forwarded with "bts forwarded XXXX upstream-bug-url", where upstream-bug-url is the URL of the upstream bug report you created (e.g. http://projects.ciarang.com/p/feed2omb/issues/21/ recorded in #609345). If the upstream authors fix the bug you reported, you can tag the Debian bug with "fixed-upstream" so that it's easier to find bugs to close when the next upstream release comes out (example in #637275). 3.6 Updating version information The Debian BTS uses version tracking to know which package versions are affected by a given bug. It's particularly important to have correct version information for release-critical bugs, since it might affect the migration of packages to testing. You can learn more on this topic here: http://wiki.debian.org/HowtoUseBTS. 4. More advice Colin Watson wrote a constructive rant explaining some mistakes that bug triagers often make. While it refers mainly to Ubuntu's Launchpad, the advice applies equally well to Debian. Check it out to become a better bug triager! Note that you can refer to this article with this shorter URL: http://raphaelhertzog.com/go/bugtriaging/



11 September 2011

Gregor Herrmann: RC bugs 2011/36

after some months of looking only at "my" packages, I'm back in the #RCBW flow. here's my overview of RC bug activities & NMUs in the last week: two short observations:

30 June 2011

Robert Collins: dmraid (fakeraid) mirror + striped

While some folk look down on fakeraid (that is, BIOS-based RAID-until-the-OS-takes-over) solutions, I think they are pretty neat: they let a user get many of the benefits of dedicated controller cards at a fraction of the cost. The benefits include the usual ones for RAID: more spindles to handle IO, and tolerance of disk failures. And unlike pure LVM solutions, you can boot from a degraded RAID 1 / 5 / 10 set because the BIOS knows how. In some ways this is better than dedicated cards, because we have the software take over, so we can change the algorithms for IO dispatch all the way down to the individual devices :) However, these RAID volumes are in a pretty awkward spot for installers and bootloaders: inside a running Linux environment they look like software RAID, which cannot be depended on for booting, but at boot time they look like hard disks, which cannot be looked at under the hood. I recently got a new desktop machine which has one of these motherboards, and fortuitously my old desktop I was replacing had the same size disks, so I had 4 disks and the option of using a RAID setup. Apparently I'm a sucker for punishment, because I went for a RAID 10: that is, two RAID volumes made up of two-disk mirrors (the RAID 1 component), with those two volumes then combined via striping (the RAID 0 component). This has the potential for pretty nice performance: in principle any read can come from one of 2 disks, and every 64KB (the stripe size) of linear data will switch to the other mirror set, giving a nice boost. Writes need to write to 2 disks always, but every 64KB worth of data will alternate mirror sets, also giving a boost. Sadly we (Ubuntu) aren't ready for this yet: there are two key bugs that make this layout almost impossible to install into. This blog post is for my exo-memory: I want to be able to figure out what I did next time around :). Firstly, parted_devices, a helper used by Ubiquity and debian-installer to determine which block devices are actually disk drives that one can partition and install onto, has a confused heuristic when dealing with dmraid: it looks for devices which are not layered on other dmraid devices. This handily excludes partitions, but has the undesirable effect of also excluding that striped device, because it is layered on the two mirrored devices. Bug 560748 was filed about that, and I've added a workaround to it, basically disabling the filtering; that's not suitable as a long-term fix, but it will let one select the RAID volume correctly. Secondly, grub2, which needs to figure out what the name of the RAID volume will be at boot time, currently gets confused. I don't know enough to really explain it and be correct in my explanation, but I do have a fugly patch which worked for me. Bug 803658 tracks this defect. The basic approach I took was to say that dmraid devices should be an abstraction layer we don't peek under: if it claims to be a disk, well then it's a disk. As grub does actually work that way - it talks to INT 13h - the BIOS support for booting off of the RAID volume is entirely sufficient. Sadly neither bug is at the point where the patches can be rolled into Ubuntu itself, but the workaround should let folk get up and running. In both cases, build the package locally in the installer, install it, and then run ubiquity and things should install. After the install, you will need to reapply the patch in the resulting installed environment, or things like update-grub will die on you!
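As a toy illustration of the striping arithmetic above (this is not dmraid's code; the 64KB stripe size and two mirror sets are just the layout described in this post), here is how a linear byte offset maps onto a mirror set and an offset within it:

/* stripe-math.c: illustrative only. */
#include <stdio.h>
#include <stdint.h>

#define STRIPE_SIZE (64 * 1024)   /* bytes per stripe chunk */
#define MIRROR_SETS 2             /* RAID 0 across two RAID 1 pairs */

static void locate(uint64_t linear_offset)
{
    uint64_t chunk = linear_offset / STRIPE_SIZE;
    unsigned set = chunk % MIRROR_SETS;            /* which mirror pair */
    uint64_t set_offset = (chunk / MIRROR_SETS) * STRIPE_SIZE
                          + linear_offset % STRIPE_SIZE;

    printf("linear %10llu -> mirror set %u, offset %llu\n",
           (unsigned long long)linear_offset, set,
           (unsigned long long)set_offset);
}

int main(void)
{
    /* every 64KB of linear data switches to the other mirror set */
    locate(0);
    locate(65535);
    locate(65536);
    locate(196608);
    return 0;
}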
(huge thanks to cjwatson and ev for giving me some tips while I investigated this)
